Encoding dynamics for multiscale community detection: 
Markov time sweeping for the map equation 
The detection of community structure in networks is intimately related to finding a concise
description of the network in terms of its modules. This notion has been recently exploited by the map
equation formalism (M. Rosvall and C. T. Bergstrom, PNAS, 105(4), pp. 1118-1123, 2008) through 
an information-theoretic description of the process of coding inter- and intra-community transitions 
of a random walker in the network at stationarity. However, a thorough study of the relationship 
between the full Markov dynamics and the coding mechanism is still lacking. We show here that 
the original map coding scheme, which is both block-averaged and one-step, neglects the internal 
structure of the communities and introduces an upper scale, the 'field-of-view' limit, in the com- 
munities it can detect. As a consequence, map is well tuned to detect clique-like communities but 
can lead to undesirable overpartitioning when communities are far from clique-like. We show that a 
signature of this behavior is a large compression gap: the map description length is far from its ideal 
limit. To address this issue, we propose a simple dynamic approach that introduces time explicitly 
into the map coding through the analysis of the weighted adjacency matrix of the time-dependent 
multistep transition matrix of the Markov process. The resulting Markov time sweeping induces a 
dynamical zooming across scales that can reveal (potentially multiscale) community structure above 
the field-of-view limit, with the relevant partitions indicated by a small compression gap. 



I. INTRODUCTION 

The analysis of biological, technical and social networks
has become extremely popular in recent years [1-3].
The availability of high dimensional relational data
coupled with increasing computational power has set the 
ground for the investigation of complex systems from a 
network perspective, i.e., each agent or entity is viewed 
as a node interacting via multiple links with other nodes 
in the network. Such a viewpoint aims to understand the 
global emergent behavior of the system from the interac- 
tions between the individual components of the system, 
in contrast to focusing on each part on its own. 

In many cases of interest, complex networks are far 
from being unstructured and contain relevant subgroup- 
ings or communities, possibly organized into (not nec- 
essarily hierarchical) multiple levels [4]. The detection 
of such community structure can be of importance for 
the understanding of the interplay between the struc- 
tural and functional features of the network. In particu- 
lar, parts of the system operating on given scales could 
be represented with a simplified description at an appro- 
priate level of coarse graining. 

Community detection methods based on a variety 
of heuristics (including modularity [5, 6] and spectral 
partitioning methods [7-10] among many others — see 
Refs. [1, 11] for recent reviews) have been proposed to 
find an optimized split into communities. The communi- 
ties thus found result from identifying groups with high 
intra-community weights as compared to the expected
weights in surrogate models of the network. In adopt- 
ing such a structural criterion, these methods introduce 
an intrinsic scale that establishes limits on the communities
they can detect, thus leading to potential misdetection
[12]. Furthermore, such single scale methods are
not suitable for the analysis of networks in which there 
is not a single 'best' mesoscopic level of description, but 
rather multiple levels associated with different scales in 
the system [13]. 

In order to account for the presence of multiple levels
of organization, multiscale methods have been introduced
that allow one to search for the right scale at which
the network should be analyzed [14-17]. Recently, it
has been shown that one can use the time evolution of a 
Markov process on the graph to reveal relevant commu- 
nities at different scales in a process of dynamic zoom- 
ing through the so-called partition stability [12, 18, 19]. 
As the Markov time increases, the diffusive process in- 
volves multistep transitions and explores further afield 
the structure of the graph, resulting in the detection of 
community structure across scales, from finer to coarser. 
This dynamic approach has the advantage that it pro- 
vides a unifying framework for structural community de- 
tection methods (such as modularity and spectral meth- 
ods), which can be seen as particular cases of this ap- 
proach involving one-step measures. 

A different perspective is provided by an information
theoretic framework that considers the problem of finding
communities in a network as a coding or compression
problem [20-23]. The underlying idea is that the
presence of communities should imply the existence of 
an efficient and concise way to encode the behavior of 
a system in terms of its subgroups. The recently introduced map
equation method of Rosvall et al. [21, 22] relies on a
compression of the description length of a random walk inside
and between communities to find good graph partitions. 
This method has received a lot of attention, since it has 
been shown to be extremely efficient on benchmark tests 
[24] outperforming the popular modularity [5, 6]. It has 
also been shown to be immune to the resolution limit [25] 
that affects the performance of modularity. However, the 
mathematical properties and possible limitations of the 
map equation remain relatively unexplored. 

Here we show that the map equation can also be under- 
stood as a one-step method and, consequently, it suffers 
from an upper scale (the field-of-view limit) above which 
it cannot detect communities [12]. This limited field-of- 
view can lead to overpartitioning when communities are 
far from being clique-like [12]. In addition, the one-step 
map coding scheme also neglects the internal structure 
of the communities and, in doing so, introduces a bias 
towards communities that are locally fast mixing (and 
in this sense clique-like). We also show that the qual- 
ity of the map partitioning can be assessed through the 
existence of a small compression gap, i.e., a small dis- 
tance between the compression achieved by Map and its 
theoretical limit given by the true entropy rate of the 
Markov process. To alleviate some of these limitations, 
we introduce a dynamical approach that introduces time 
explicitly into the map coding scheme, by considering 
the time-dependent multi-step transition matrix of the 
Markov process on the network as the object of the map 
encoding. This introduces a dynamic zooming by sweep- 
ing through the Markov time, which allows the detection 
of multiscale community structure with the map equation 
formalism. 



II. COMMUNITY DETECTION FROM A 
CODING PERSPECTIVE: THE MAP EQUATION 

The map formalism considers the problem of partition- 
ing a network into non-overlapping communities from 
a coding perspective. The original map formalism [21] 
equates the quality of the partition to the efficiency of 
a code that would describe the notional transitions of 
a random walker inside and between communities. The 
Infomap algorithm can then be used to obtain good par- 
titions through the optimization of this quality function. 
The underlying principle is that the code for such one- 
step transitions of the random walker can be efficiently 
compressed in the presence of a strong community struc- 
ture: short names for nodes (codewords) can be reused 
in different communities, much like street names can be 
reused in different cities of a country [21]. In the original 
map equation, the movement of the walker is described 
in terms of two kinds of codebook. The first kind of code- 
book is specific to each community and assigns a unique 
codeword for each node inside it and a particular exit 
codeword for the community. An additional codebook
contains unique codewords that describe the movements
between different communities. More recently, a hierar- 
chical extension of the map formalism (a recursive version 
of the original method) has been presented [22] as well 
as an extension for overlapping modules [26]. We do not 
consider these extensions in detail here, as both methods 
are based on the same principles of the standard map 
equation and our findings are applicable to these as well. 

A. Definitions and notation 

An explicit rewriting of the original map formalism in 
terms of the stationary distribution of a random walk is 
as follows. Consider a discrete time Markov process on a 
graph with N nodes: 

$$p_{k+1} = p_k D^{-1} A \equiv p_k M, \qquad (1)$$

where $p_k$ is the $1 \times N$ (node) probability vector, $A$ is
the (weighted) adjacency matrix of the graph, $D$ is the
diagonal matrix containing the (weighted) degree of each
node, and we have also defined $M$, the transition matrix
of the random walk. The stationary distribution of the
random walk, $\pi$, is then given by:

$$\pi = \pi M. \qquad (2)$$

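Consider now a partition of the network into $c$ communities
indexed by $\alpha = 1, \ldots, c$. At stationarity, the
probability of leaving community $\alpha$ (or of arriving at
community $\alpha$) is

$$q_{\alpha\curvearrowright} = \sum_{i \in \alpha} \sum_{j \notin \alpha} \pi_i M_{ij},$$

and the overall probability of changing community is

$$q_{\curvearrowright} = \sum_{\alpha=1}^{c} q_{\alpha\curvearrowright}.$$

Similarly, the probability to stay within or to leave community
$\alpha$ is

$$p_{\circlearrowright}^{\alpha} = q_{\alpha\curvearrowright} + \sum_{i \in \alpha} \pi_i.$$

The map equation then defines the per-step description
length of a code associated with this partition as:

$$L_M = \sum_{\alpha=1}^{c} p_{\circlearrowright}^{\alpha} H(P^{\alpha}) + q_{\curvearrowright} H(Q), \qquad (3)$$

a weighted combination of the Shannon entropies:

$$H(P^{\alpha}) = -\frac{q_{\alpha\curvearrowright}}{p_{\circlearrowright}^{\alpha}} \log_2\!\left(\frac{q_{\alpha\curvearrowright}}{p_{\circlearrowright}^{\alpha}}\right) - \sum_{i \in \alpha} \frac{\pi_i}{p_{\circlearrowright}^{\alpha}} \log_2\!\left(\frac{\pi_i}{p_{\circlearrowright}^{\alpha}}\right),$$

$$H(Q) = -\sum_{\alpha=1}^{c} \frac{q_{\alpha\curvearrowright}}{q_{\curvearrowright}} \log_2\!\left(\frac{q_{\alpha\curvearrowright}}{q_{\curvearrowright}}\right).$$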
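As a minimal sketch of how Eq. (3) can be evaluated for a given partition, the following Python snippet computes the leaving probabilities and the two codebook terms from a transition matrix and its stationary distribution. All function and variable names (`ring_adjacency`, `random_walk`, `map_description_length`, and so on) are illustrative choices and are not taken from the Infomap code. The stationary distribution is assumed to be proportional to the degree, which holds for undirected graphs only.

```python
import math


def ring_adjacency(n):
    """Adjacency matrix of an unweighted, undirected cycle with n >= 3 nodes."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1.0
    return A


def random_walk(A):
    """Return M = D^{-1} A and pi = d / sum(d) (assumes an undirected graph)."""
    degrees = [sum(row) for row in A]
    total = sum(degrees)
    M = [[a / d for a in row] for row, d in zip(A, degrees)]
    pi = [d / total for d in degrees]
    return M, pi


def xlog2(x, total):
    """x * log2(x / total), with the convention 0 * log2(0) = 0."""
    return x * math.log2(x / total) if x > 0 else 0.0


def map_description_length(M, pi, communities):
    """Evaluate Eq. (3) for a partition given as a list of lists of node indices."""
    n = len(pi)
    label = {i: a for a, nodes in enumerate(communities) for i in nodes}
    # q[a]: stationary probability of leaving community a in one step
    q = [sum(pi[i] * M[i][j] for i in nodes for j in range(n) if label[j] != a)
         for a, nodes in enumerate(communities)]
    q_total = sum(q)
    # q_total * H(Q): cost of the inter-community codebook
    length = -sum(xlog2(qa, q_total) for qa in q)
    # p * H(P^a): cost of each community-centric codebook
    for qa, nodes in zip(q, communities):
        p = qa + sum(pi[i] for i in nodes)
        length -= xlog2(qa, p) + sum(xlog2(pi[i], p) for i in nodes)
    return length


# Example: a 20-node cycle, as a single community and as five blocks of four nodes.
M, pi = random_walk(ring_adjacency(20))
all_in_one = [list(range(20))]
five_blocks = [list(range(4 * a, 4 * a + 4)) for a in range(5)]
L_all = map_description_length(M, pi, all_in_one)
L_five = map_description_length(M, pi, five_blocks)
print(L_all, L_five)
```

For the single-community partition there are no exits, so the sketch reduces to the entropy of the stationary distribution, here $\log_2 20$ bits per step.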




Figure 1. (Color online) Equivalent graph partitions for
the map equation. Because the map equation ignores the
specific connectivity of the graph, graph partitions with equal
equilibrium and leaving probabilities become indistinguishable
to Map. Different communities are represented by different
colors (shades of gray). Unless indicated, the weight of
the edge is 1. (a) Two graphs with different intra-community
connectivity but the same map coding length, $L_M$. (b) Two
graphs with different inter-community connectivity and the
same $L_M$. (c) Two graphs with equal $L_M$ but very different
inter- and intra-community connectivity. From the viewpoint
of Map, a ring-of-rings is equivalent to a clique-of-cliques with
different weights.

The two terms in Eq. (3) correspond to two classes of
codebooks that encode one-step transitions at stationarity
viewed through the prism of the given partition.
The first term stems from the "community-centric" codebooks
with probability distributions $P^{\alpha}$ (and associated
entropy) of being at or leaving from each of the communities.
The second term corresponds to the "inter-community"
codebook with distribution $Q$ (and associated
entropy) of changing community.

In the original map formalism it is proposed that a low
$L_M$ is a characteristic of good partitions and the Infomap
algorithm is used to search computationally for partitions
with low $L_M$.

III. MAP ENCODES BLOCK-AVERAGED, 
ONE-STEP TRANSITIONS: IMPLICATIONS FOR 
COMMUNITY DETECTION 

As shown by the definitions above, the original map 
equation does not fully code for the dynamics of the
Markov process (1), as it only uses quantities derived
from block-averaging of one-step transitions at station- 
arity. The simplifications involved in block-averaging 
the structure and in ignoring longer-term dynamics both 
have inter-related implications for community detection, 
which we now study in detail. 



A. Block-averaging the connectivity: the 
compression gap and a bias towards over-fitting to 
clique-like communities 

An examination of the terms in the map equation (3)
reveals that the implicit block-averaging neglects the internal
structure of the communities as well as the detailed
inter-community connectivity. More precisely, given a
particular partition, all graphs with the same equilibrium
distribution $\pi$ and overall leaving probabilities $q_{\alpha\curvearrowright}$ will
be indistinguishable in terms of their map quality, $L_M$,
as exemplified in Figure 1.

From the viewpoint of entropies, the map equation (3)
is formally equivalent to a weighted sum of the entropies
of i.i.d. stochastic processes with states visited
according to: normalized "community-centric" probabilities
$\{\{\pi_i / p_{\circlearrowright}^{\alpha}\}_{i \in \alpha},\; q_{\alpha\curvearrowright} / p_{\circlearrowright}^{\alpha}\}_{\alpha=1}^{c}$ and normalized "leaving"
probabilities $\{q_{\alpha\curvearrowright} / q_{\curvearrowright}\}_{\alpha=1}^{c}$, respectively. Alternatively,
this procedure may be seen as formally equivalent
to using a block-averaged transition matrix corresponding
to a block-structured weighted (and in general
directed) complete graph with self-loops. Consequently,
Infomap exhibits a bias towards identifying communities
that are formally equivalent to clique-like subgraphs.

In this sense, the map equation can be seen to code for
a two-level, mean-field organization: one inside communities,
one across communities. Such block-structured,
all-to-all models are a good representation of community
structure based on hierarchical cliques-of-cliques. Indeed,
the map equation performs well in block-structured
Erdős-Rényi benchmarks [24] and is not afflicted by the
'resolution limit' [25]. On the other hand, there are important
networks with a more marked local structure in
which communities are not clique-like [12]. Because the
map formalism has not been designed to detect such non
clique-like communities with large effective distances, Infomap
will tend to overpartition such networks.



Ignoring the detailed connectivity: the compression gap of 
the map equation 

The fact that Map ignores the detailed connectivity in- 
side and outside the communities leads to a sub-optimal 
coding scheme. This sub-optimality can be quantified 
through the compression gap (defined below), which can 
be used as a measure of when the Map block-averaging 
assumptions are a valid simplification for the network 
under study. 

Consider the Markov chain with transition matrix $M$
and stationary distribution $\pi$, as given in Eq. (1). The
most efficient coding of the dynamics of the associated
Markov process at stationarity is bounded from below
by the entropy rate $h$ [27, 28]:

$$h(\pi; M) = -\sum_{ij} \pi_i M_{ij} \log_2(M_{ij}). \qquad (4)$$

The corresponding optimal encoding can be asymptoti- 
cally achieved by endowing each node with a dictionary 
for its outgoing links, as shown by Shannon [27]. This is 
a kind of 'edge encoding.' 

On the other hand, if we consider a coding scheme that
gives each node a unique name within the whole graph
(i.e., a 'node encoding'), then the corresponding coding
length is bounded by the entropy rate of the i.i.d. random
variable with probability distribution $\pi$, which is equal to
the entropy of the stationary distribution:

$$H(\pi) = -\sum_{i} \pi_i \log_2(\pi_i). \qquad (5)$$

The map coding scheme can be seen as a mixture of
both: it encodes nodes uniquely within communities, but
encodes for transitions ('edges') between communities.
Therefore, in general,

$$h(\pi; M) \leq L_M \leq H(\pi), \qquad (6)$$

and Map is sub-optimal in terms of its coding length [29],
as recognized by Rosvall and Bergstrom in their original
publication [21]. This sub-optimality can be understood
with a simple example: consider a community $\alpha_{\mathrm{out}}$ from
which there is only one possible link to another community
$\alpha_{\mathrm{to}}$. Map encodes this transition with two codewords:
an exit codeword to signal the leaving of $\alpha_{\mathrm{out}}$
and a codeword to identify the destination community
$\alpha_{\mathrm{to}}$. Clearly, the second codeword is redundant.

Importantly, if the graph is a weighted, directed clique
(i.e., with transition matrix $M = \mathbf{1}\pi$), then
$h(\pi; M) = H(\pi)$ and the two coding schemes ('edge' and 'node')
give the same result (see Figure 2(a)). Therefore, the
sub-optimality of the map coding is minimal when the 
graph is close to a clique. Consequently, the minimiza- 
tion of the map cost function is well suited to identify 
community structure that is a clique of cliques: within 
each community Map uses a 'node' encoding while be- 
tween communities Map encodes transitions by default. 
In such a scenario, the map coding scheme is nearly op- 
timal and close to the entropy rate. 
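The equality of the two coding limits for $M = \mathbf{1}\pi$ can be checked in one line from Eqs. (4) and (5). The short derivation below is added for illustration and uses only $M_{ij} = \pi_j$ and $\sum_i \pi_i = 1$:

```latex
\begin{aligned}
h(\pi; \mathbf{1}\pi)
  &= -\sum_{i,j} \pi_i \, \pi_j \log_2(\pi_j)
   = -\Big(\sum_i \pi_i\Big) \sum_j \pi_j \log_2(\pi_j) \\
  &= -\sum_j \pi_j \log_2(\pi_j) = H(\pi).
\end{aligned}
```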

The sub-optimality of the map encoding plays a significant
role when encoding communities with restricted
connectivity. For instance, if the community is a ring,
a random walker has only two possible nodes to transition
to, instead of any node in the community, as implicitly
assumed by Map. In this case,
there is a large gap between the map description length
$L_M$ and the optimal limit established by $h(\pi; M)$, indicating
that the full consideration of the graph structure
in the Markov dynamics could be exploited for a better
encoding (see Figure 2(b) for an example).




Figure 2. The compression gap of the map coding
scheme. (a) Example sequence: ...AACBDCCAB...; entropy rate
$h = 2$ bits/step; map encoding $L_M = 2$ bits/step. For a clique
with self loops, the map coding scheme is optimal (assuming no
communities) and is equivalent to a uniform i.i.d. process with
four states. (b) Example sequence: ...EFGHEFGHE...; entropy rate
$h = 0$ bits/step; map encoding $L_M = 2$ bits/step. In a
directed cycle, the map coding is far from optimal. The movement
of a random walker on this graph can be encoded by just
denoting the starting position but the map coding scheme enforces
unique names for each node and thus requires at least
2 bits/step.
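The two cases of Figure 2 can be reproduced numerically. The sketch below evaluates the entropy rate of Eq. (4) and the stationary entropy of Eq. (5) for a four-node clique with self loops and for a four-node directed cycle. The function names are illustrative, and the uniform stationary distribution is entered by hand, which is valid for both examples.

```python
import math


def entropy_rate(M, pi):
    """Eq. (4): h = -sum_ij pi_i M_ij log2(M_ij), skipping zero entries."""
    return -sum(p * m * math.log2(m) for p, row in zip(pi, M) for m in row if m > 0)


def stationary_entropy(pi):
    """Eq. (5): H(pi) = -sum_i pi_i log2(pi_i)."""
    return -sum(p * math.log2(p) for p in pi if p > 0)


n = 4
uniform = [1.0 / n] * n

# (a) clique with self loops: every transition has probability 1/4
clique = [[1.0 / n] * n for _ in range(n)]
# (b) directed cycle: node i moves to node i + 1 with probability 1
cycle = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

h_clique = entropy_rate(clique, uniform)
h_cycle = entropy_rate(cycle, uniform)
H = stationary_entropy(uniform)
print(h_clique, h_cycle, H)
```

With no communities, the map description length equals $H(\pi)$, so both graphs cost 2 bits per step under Map, whereas the entropy rate is 2 bits per step for the clique and vanishes for the deterministic directed cycle.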

This discussion highlights the fact that the block-averaging
implicit in the original map scheme leads to a
sub-optimality of the proposed map coding scheme that
becomes significant when the network cannot be well described
as a clique-of-cliques. In order to quantify this
effect, we define the compression gap, $\delta$:

$$\delta = (L_M - h)/h, \qquad (7)$$

which measures how close the map encoding is to optimality.
Note that other measures for the compression
gap, such as $\delta' = (L_M - h)/(H - h)$, could be used and
may be more suitable, or sensitive, in some cases. In
this manuscript we stick mostly to the slightly simpler
expression of $\delta$, as it is sufficient for our purposes. The
compression gap can be used to establish when the communities
identified by Map are far from being clique-like
and hence serves as an indicator of the reliability of the
partitions obtained by Infomap, as shown below.



B. One-step transitions: the field-of-view limit and
a bias towards overpartitioning of non clique-like
communities

As discussed above, the original map formalism is 
based on an implicit clique-like concept of community, 
and a community structure as a (statistical) clique of 
cliques. Although this model has proved successful in 
a variety of fields [24, 30], relevant technological, social 
and biological networks are far from being clique-like [12]. 
In such cases, Infomap might tend to overpartition com- 
munities as a result of an upper scale (the 'field-of-view 
limit') which stems from map encoding only for one-step 
transitions at stationarity. This field-of-view limit affects
all one-step methods, including not only Map but
also modularity. The field-of-view limit occurs on the opposite
end of the well-known resolution limit that appears
as a lower scale for modularity [25] but does not seem to
impact Map [17].



Over-partitioning of lattice-like graphs 

The overpartitioning induced by the field-of-view limit 
can be understood analytically through the following 
simple examples of lattice-like graphs. 

First, consider a cycle graph of length $N$ with unweighted
edges. The equilibrium distribution of the random
walk on this graph is $\pi_i = 1/N$, $i = 1, \ldots, N$. This
graph has no community structure and the only relevant
partition should be the global "all-in-one".

For a partition of the ring into $c \geq 2$ communities
indexed by $\alpha$ we have:

$$q_{\alpha\curvearrowright} = \frac{1}{N}, \; \forall \alpha; \qquad q_{\curvearrowright} = \frac{c}{N}; \qquad p_{\circlearrowright}^{\alpha} = \frac{n_\alpha + 1}{N},$$

where $n_\alpha$ is the number of nodes in community $\alpha$ and,
clearly, $\sum_{\alpha=1}^{c} n_\alpha = N$. The map cost function of this
partition is

$$L_M(\{n_\alpha\}_{\alpha=1}^{c}) = \frac{c}{N} \log_2(c) + \sum_{\alpha=1}^{c} \frac{n_\alpha + 1}{N} \log_2(n_\alpha + 1). \qquad (8)$$

Using convexity arguments, it is easy to show that for a
given $N$ and $c$, the minimal $L_M$ is attained for the partition
with equally-sized communities with $n_\alpha = N/c, \; \forall \alpha$,
if it exists. For such a partition, the map equation (8)
becomes:

$$L_M(\{N/c\}_{\alpha=1}^{c}) = \left(1 + \frac{c}{N}\right) \log_2(N + c) - \log_2(c), \qquad (9)$$

with $c \geq 2$. The case $c = 1$ is the trivial "all-in-one"
partition with $L_M(\{N\}_{\alpha=1}^{1}) = \log_2 N$.

The relevant Map optimization for the cycle graph of
size $N$ is then equivalent to finding which of the equal
partitions into $c$ communities has the lowest $L_M$:

$$\min_{c} L_M(\{N/c\}_{\alpha=1}^{c}).$$

Assume $N/c$ to be real to facilitate the analysis, a relaxation
which our numerics show not to affect the result.
Then the partition with minimal $L_M$ has equal communities
of size $N/c^*$ with $c^*(N)$ given by:

$$\ln(N + c^*) = \frac{N}{c^*} - 1, \qquad c^* \geq 2. \qquad (10)$$

It is easy to show that, for a long enough ring, such a
partition will have lower $L_M$ than the 'all-in-one' partition.
Indeed, the map equation partitions all cycles with
$N > 10$.

Similar results are obtained for the regular $k$-cycles
used as the starting point for the small-world construction
(see Section VC). In this case, Map partitions the
$k$-cycle into equally-sized communities of size $N/c^*$, with $c^*$ given
by:

$$\ln\!\left(\frac{2N}{k+1} + c^*\right) = \frac{2N}{c^*(k+1)} - 1, \qquad c^* \geq 2. \qquad (11)$$

The same reasoning can be applied to a torus network,
i.e., the cartesian product of two cycles of lengths
$R$ and $r$, with $N = rR$. This graph can be thought of as
the discretization of a 2-dimensional lattice with periodic
boundary conditions. It is easy to show that the optimal
radially symmetric partition of the graph (with $R > r$)
is into communities of size $N/c^*$ with $c^*$ given by:

$$\ln(2R + c^*) = \frac{2R}{c^*} - 1. \qquad (12)$$

Therefore, as the size of the lattice $N$ increases, Infomap
will partition the torus into smaller sections. Our numerical
exploration shows that the above solution is a
conservative estimate and the overpartitioning induced
by Map is even more acute for the torus: As $N$ grows,
other even smaller patch-like partitions are obtained by
the Infomap optimization.



IV. A DYNAMICAL ENHANCEMENT OF THE 
MAP SCHEME: MARKOV TIME SWEEPING 
FOR THE MAP EQUATION 

As discussed above, the original map equation does 
not fully account for the dynamics of the Markov pro- 
cess (1), as it only uses quantities derived from block- 
averaged one-step transitions. Such a simplification is 
reasonable for clique-like communities, which exhibit a 
small compression gap and can be fully explored in one 
step. However, networks of interest sometimes possess 
a multi-scale, non clique-like community structure which 
will go unrecognized by the original map equation due to 
its intrinsic bias towards cliques and the ensuing field-of- 
view limit. 

The limitations of the map equation in such scenar- 
ios can be overcome by adopting concepts from partition 
stability, a recently introduced dynamical framework for 
community detection [12, 18, 19]. The idea is to consider 
the time evolution of the Markov process as a means to 
unfolding systematically the graph structure at different 
scales. This Markov time sweeping, which is equivalent 
to considering multi-step transitions, applies a natural 
zooming process (from small to large scales) to the network.
A key aspect of this approach is the systematic
sweeping across scales provided by the dynamics, which
minimizes the effects of the resolution and field-of-view
limits. For an extended discussion, see [12, 18, 19, 31].

This Markov time sweeping can be used to endow the 
map equation with a dynamic zooming that allows it 
to detect multi-scale community structure, with relevant 
partitions characterized by a low compression gap. For 
simplicity, consider the continuous version of the Markov
process (1) associated with a graph with adjacency matrix
$A$ on $N$ nodes:

$$\dot{p} = -p D^{-1} L, \qquad (13)$$

where $p$ is a $1 \times N$ vector of probabilities, $D$ is the
diagonal matrix containing the (weighted) degree of each node and
$L = D - A$ is the graph Laplacian. It is easily verified
that this continuous-time Markov process has the
same stationary distribution as the discrete-time random 
walk (1) [18, 19]. 

The analytical solution of this system leads us to con- 
sider the discrete-time process: 

$$p_{k+1} = p_k T(t), \qquad (14)$$

where $T_{ij}(t) = [e^{-t D^{-1} L}]_{ij}$ is the effective transition
probability between nodes $i$ and $j$ after a (Markov) time
$t$. Within this framework, it is easy to see that the original
map formulation considers the linearized version of
$T(t)$ evaluated at time $t = 1$. Consequently, the original
map equation scheme is included as a particular case in 
our formulation and we can always recover the standard 
Map results under our scheme [32]. 
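The statement that the standard scheme corresponds to the linearization at unit Markov time can be spelled out as follows. This short derivation is added for illustration; the label $T_{\mathrm{lin}}$ is introduced here and does not appear elsewhere in the text.

```latex
\begin{aligned}
T(t) &= e^{-t D^{-1} L} = I - t\, D^{-1} L + O(t^2), \\
D^{-1} L &= D^{-1}(D - A) = I - M, \\
T_{\mathrm{lin}}(t) &= I - t\,(I - M) = (1 - t)\, I + t\, M,
\qquad T_{\mathrm{lin}}(1) = M.
\end{aligned}
```

The same identity $D^{-1} L = I - M$ also shows that $\pi D^{-1} L = \pi - \pi M = 0$, so $\pi$ is stationary for the continuous-time process (13) as well.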

Our approach is then to use the map equation to
analyze the community structure of the time-dependent
weighted network $D\,T(t)$ as a function of the (Markov)
time, $t$. As time grows, the transition matrix $T$ becomes
less sparse and more clique-like, yet in a structured
manner that reflects the community structure of
the network [19]. Consequently, the leaving probabilities
$q_{\alpha\curvearrowright}(t) = \sum_{i \in \alpha} \sum_{j \notin \alpha} \pi_i T_{ij}(t)$ increase with increasing
time; the cost for encoding distinct communities increases
too; and Map tends to find coarser communities that can
be better represented as cliques. More specifically:

• For $t \to 0$, the leaving probabilities go to zero and
the map equation is minimized by setting each node
in its own community, as can be easily verified.

• For $t \to \infty$, we approach the limit of an i.i.d. random
process, i.e., $T(t) \to \mathbf{1}\pi$, where $\mathbf{1}$ is the vector
of ones. In this limit, the map encoding for the
"all-in-one" partition is optimal, since it results in
a description length which is equivalent to the entropy
rate. More precisely, it is easy to see from
Eq. (6) that $\delta(t) \to 0$ as $t \to \infty$.

• For intermediate times, the Markov time acts as a
natural resolution parameter and the partitions of
the time-dependent weighted graph $D\,T(t)$ become
increasingly coarse. By following the time evolution,
we can check whether a particular partition
corresponds merely to a transient or whether it is
persistent for a range of times.
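The behaviour described in these three regimes can be explored with a small numerical sketch. The Python code below builds $T(t)$ for an undirected graph, applies the map equation to the flow $\pi_i T_{ij}(t)$ for a fixed partition, and reports the compression gap at each Markov time. Several choices are assumptions made for illustration: the compression gap is evaluated with the entropy rate of $T(t)$ itself, the matrix exponential is a simple Taylor series with scaling and squaring (in practice a library routine such as `scipy.linalg.expm` would be preferable), and no Infomap optimization is performed, so the partitions are supplied by hand. All names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def expm(X, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    n = len(X)
    norm = max(sum(abs(x) for x in row) for row in X)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    Xs = [[x / 2.0 ** squarings for x in row] for row in X]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, Xs)]
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def transition_at_time(A, t):
    """T(t) = exp(-t D^{-1} L) with L = D - A, i.e. exp(t (D^{-1} A - I))."""
    deg = [sum(row) for row in A]
    gen = [[t * (a / d - (i == j)) for j, a in enumerate(row)]
           for i, (row, d) in enumerate(zip(A, deg))]
    return expm(gen)


def xlog2(x, total):
    return x * math.log2(x / total) if x > 0 else 0.0


def map_length(T, pi, communities):
    """Map equation (3) applied to the flow pi_i T_ij(t)."""
    n = len(pi)
    label = {i: a for a, nodes in enumerate(communities) for i in nodes}
    q = [sum(pi[i] * T[i][j] for i in nodes for j in range(n) if label[j] != a)
         for a, nodes in enumerate(communities)]
    length = -sum(xlog2(qa, sum(q)) for qa in q)
    for qa, nodes in zip(q, communities):
        p = qa + sum(pi[i] for i in nodes)
        length -= xlog2(qa, p) + sum(xlog2(pi[i], p) for i in nodes)
    return length


def entropy_rate(T, pi):
    return -sum(p * x * math.log2(x) for p, row in zip(pi, T) for x in row if x > 0)


def sweep(A, communities, times):
    """Return [(t, L_M(t), h(t), delta(t))] for a fixed partition."""
    deg = [sum(row) for row in A]
    pi = [d / sum(deg) for d in deg]  # stationary distribution, undirected graph
    out = []
    for t in times:
        T = transition_at_time(A, t)
        L, h = map_length(T, pi, communities), entropy_rate(T, pi)
        out.append((t, L, h, (L - h) / h))
    return out


# Example: an 8-node cycle, swept for the all-in-one partition and for two halves.
n = 8
ring = [[1.0 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
times = [0.5, 1.0, 2.0, 5.0, 50.0]
results_all = sweep(ring, [list(range(n))], times)
results_halves = sweep(ring, [[0, 1, 2, 3], [4, 5, 6, 7]], times)
for row in results_all:
    print(row)
```

For the all-in-one partition the description length stays at $H(\pi)$ while the entropy rate of $T(t)$ grows towards it, so the compression gap of this partition shrinks towards zero at long Markov times.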

Furthermore, the compression gap (7) can be used as 
an information-theoretic indicator of the reliability of the 
partitions found by Infomap at different Markov times. 
As discussed above, a low $\delta$ is expected when the partition
reflects a community structure close to that of a
clique of cliques, thus conforming to the assumptions underlying
the map formalism. Therefore, low values of
$\delta(t)$ can be used to indicate relevant map partitions and
also to identify the existence of a multi-scale community
structure in the network.

This Markov time sweeping brings to the map equation 
what the partition stability offers to modularity [18, 19]; 
namely, the possibility to use time as a means to scan 
naturally through the resolution of community detection 
(from fine to coarse) in a manner that is consistent with 
the Markov dynamics on the graph. From this dynamical 
viewpoint, the standard map equation corresponds to a 
time-snapshot of the diffusion dynamics. Furthermore, 
this dynamical approach is a natural framework for the 
map scheme, since it introduces a time-dependent but 
finite probability of jumping from any node to any other 
node at all times, in line with the formalism underpinning 
the map equation. 



V. SOME ILLUSTRATIVE EXAMPLES 

In this section, we illustrate the use of Markov sweep- 
ing map with simple examples. The procedure is as 
follows: For each Markov time, we construct the time- 
dependent network defined by D T(t). We then optimize 
the (time-dependent) map cost function using the implementation
of Infomap for directed graphs found online at
http://www.tp.umu.se/~rosvall/, slightly modified to
enable self-loops in the graphs. We only consider here
undirected networks but the method can be extended
easily to directed graphs when we allow for teleportation
[33]. For all examples, 100 runs of the Infomap algorithm
at each Markov time were used to find the optimal
partition.



A. A network without community structure: the 
cycle graph 

As a first example, we apply Markov time sweeping to 
the ring network discussed in Section IIIB. Recall that 
for the cycle graph with N = 20 nodes, our analytical 
arguments show that the original map scheme leads to a 
non-intuitive partition into 5 equal communities, instead 
of the expected 'all-in-one' partition. However, the high
compression gap of the 5-way partition found by the standard
Map ($\delta \approx 2.48$) confirms that this partition is far
from being formed by clique-like communities. Because
standard Map is being applied to a network which does 
not conform to the implicit assumptions about commu- 
nity detection in the original map framework, we see an 
overpartitioning in this case. 
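These numbers can be checked directly from the closed-form description length of equal partitions of the cycle. The sketch below assumes Eq. (9) in the form $L_M = (1 + c/N) \log_2(N + c) - \log_2(c)$ for $c \geq 2$ and $L_M = \log_2 N$ for $c = 1$, and uses an entropy rate of 1 bit per step for the unbiased walk on an undirected cycle (two equiprobable moves at every node). The function name and the restriction to divisors of $N$ are illustrative choices.

```python
import math


def cycle_map_length(N, c):
    """Description length of a cycle of N nodes split into c equal communities."""
    if c == 1:
        return math.log2(N)  # all-in-one partition: no exit codewords are needed
    return (1 + c / N) * math.log2(N + c) - math.log2(c)


N = 20
# only values of c that divide N give exactly equal communities
candidates = [c for c in range(1, N + 1) if N % c == 0]
lengths = {c: cycle_map_length(N, c) for c in candidates}
best_c = min(lengths, key=lengths.get)

h = 1.0  # entropy rate in bits/step: each step is one of two equiprobable moves
delta = (lengths[best_c] - h) / h
print(best_c, round(lengths[best_c], 4), round(delta, 2))
```

Under these assumptions the minimum is attained at five communities, with a compression gap that rounds to the value of 2.48 quoted above.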

As seen in Figure 3, analyzing this network with the
Markov-sweeping version of Map reveals that there is no
significant community structure in this graph. Only the
singleton partition (at very short times) and the global
partition (at very long times) provide significant groupings
of the nodes while all other partitions show high
values of $\delta$.

Figure 3. (Color online) Markov sweeping map for a cycle
graph with N = 20 (horizontal axis: Markov time). As the
Markov time increases, the
map partitioning goes from the finest possible partition to the
global 'all-in-one' partition (solid blue line). However, as indicated
by the featureless decay of the compression gap $\delta$ with
no clear minima (dashed green line), no other relevant community
is found between those two extreme partitions, thus
signaling the lack of community structure. In this case, optimizing
the standard map equation finds 5 communities but a
large compression gap $\delta \approx 2.48$ indicates that this partition
is unreliable. Inset: analyzed graph.

B. A simple network with multi-scale community 
structure 

Consider now a weighted graph with a distinct hier- 
archical community structure: two triangles of triangles 
with weighted links to reinforce the hierarchy (see inset 
of Figure 4). In this example, the standard map equation 
method identifies the fine structure of six small triangles. 
(We note that the hierarchical map equation uncovers 
the two-tier hierarchy of communities in this graph.) 

Our proposed Markov sweeping map also recovers the
hierarchy of partitions across time-scales, as indicated by
the sharp decreases in the compression gap $\delta$ when the
six-fold and the two-fold partition are detected (Figure
4). Our method also indicates over which timescales the
relevant partitions appear to be natural. For instance, a
change in the weights would induce changes in the lengths
of the plateaux corresponding to the different levels of the
hierarchy. As stated above, hierarchical Map is able to
resolve this clique-like community structure (while standard
Map finds only the fine structure). However, if the
multi-scale structure is not clique-like, hierarchical Map
may fail to resolve the multi-scale structure, as shown in
the network of small-world communities discussed in the
next section.



C. A ring of small-world communities

Figure 4. (Color online) Markov sweeping map for a
graph with a hierarchical community structure. The
graph analyzed (inset) has a clear community structure given
by a hierarchy of triangles: the six smaller triangles (denoted
by different colors) have edges within them of weight 100; they
are grouped into two larger triangles with weaker links (the
edges between the 6 small triangles have weight 10); the edge
between the two big triangular structures has weight 1. The
compression gap (dashed green line) shows two clear minima,
indicating well defined partitions into 6 and 2 communities,
corresponding to the two tiers of the hierarchy. Standard
Infomap finds only the 6 small triangles (δ ≈ 0.62).

As a next scenario, we study a ring of five weakly connected
small-world graphs [34] of 200 nodes each, as introduced
in [12] (see Figure 5(a)). We use the CONTEST
toolbox [35] to generate small-world communities
by adding random connections à la Newman-Watts [36]
starting from a pristine world with two nearest neighbours
[37] but allowing for the possibility of multiple
shortcuts at each node.
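
The benchmark construction can be sketched in a few lines of NumPy. This is not the CONTEST toolbox, which is what the text uses; it is a minimal stand-in under stated assumptions: the pristine world is read as a plain cycle (two nearest neighbours per node), exactly round(s·n) distinct shortcuts are added per small-world, consecutive small-worlds are joined by a single weak link between arbitrarily chosen nodes, and the weights 5 and 1 are taken from the caption of Figure 5. The function name `ring_of_small_worlds` and its parameters are hypothetical.

```python
import numpy as np

def ring_of_small_worlds(n_comm=5, n=200, s=1.0, w_in=5.0, w_out=1.0, seed=0):
    """Weighted adjacency of `n_comm` small-worlds joined in a ring (wiring assumed)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_comm * n, n_comm * n))
    for c in range(n_comm):
        off = c * n
        for i in range(n):                   # pristine world: a plain cycle
            j = (i + 1) % n
            A[off + i, off + j] = A[off + j, off + i] = w_in
        added = 0
        while added < int(round(s * n)):     # add shortcuts, never rewire
            i, j = rng.integers(0, n, size=2)
            if i != j and A[off + i, off + j] == 0:
                A[off + i, off + j] = A[off + j, off + i] = w_in
                added += 1
        nxt = ((c + 1) % n_comm) * n         # one weak link to the next small-world
        A[off, nxt + n // 2] = A[nxt + n // 2, off] = w_out
    return A

A = ring_of_small_worlds(s=2.5)
print(A.shape, int(np.count_nonzero(np.triu(A))))
```

Because shortcuts are added rather than rewired, a node may receive several of them, in line with the construction described above; duplicate shortcuts between the same pair are skipped so that the graph stays simple.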

As discussed in Section IIIB, the standard map equation
will tend to overpartition lattice-like structures, such
as the pristine worlds (k-cycles) used as starting point for
the small-world construction. As given by Eq. (11), standard
Infomap partitions the pristine world with N = 200
and k = 2 into 22 equally-sized communities.

This overpartitioning persists when few random short- 
cuts are added, as shown in Figure 5(b). Only when the 
average number of added shortcuts per node, here de- 
noted by s, is greater than 3.5 (and the mean distance 
within the small-world has become small) does standard 
Map obtain the right split into five communities. This 
is consistent with our discussion pertaining to the field-of-view,
i.e., the smaller the mean path length, the more
clique-like the structure. In this case, hierarchical In- 
fomap can even give a non-intuitive partition into 4 com- 
munities, due to the non clique-like nature of the commu- 
nities. On the other hand, Figure 5(c) shows that Markov 
sweeping allows Map to detect the relevant partition into 
5 communities over an extended time-scale with a small 
compression gap. 



VI. DISCUSSION 

A key insight to emerge from the map equation formalism
is the fact that a coarse-grained description of a
graph in terms of its communities is intimately related
to finding concise descriptions of the information flow
on these networks, and hence to the field of coding theory
and data compression. However, the adoption of a
coding or compression mechanism has important effects
on the outcome of the algorithm and ultimately reflects
the underlying assumptions about the concept of community.
Here we have shown that the original map equation
formalism is inherently tuned towards a block-averaged
notion of community structure as a weighted, statistical
clique of cliques. This tuning stems from two interrelated
simplifications: the block-averaged coding mechanism,
which ignores the detailed connectivity and exhibits
a large compression gap for non-clique structures,
and the use of one-step quantities, which ignores the effect
of multi-step flows in the communities and leads to
an upper scale (field-of-view) for detection. This intrinsic
bias explains the excellent performance of the map
equation in clique-like benchmarks but can lead to
unexpected overpartitioning of networks if they differ
strongly from the assumed clique-like organization.

Figure 5. (Color online) Community detection in a ring
of small-world communities. (a) Ring of 5 small-world
communities with N = 200 each. The edges within the
small-worlds have weight 5 while the weight of the links between
them is 1. All the small-worlds have an average number of
randomly added shortcuts per node, s. For s = 1 (shown),
standard Map shows a strong overpartitioning leading to an
average of 16 communities inside each small-world (indicated
by different colors in online version). (b) Number of communities
found by standard Map vs. mean path length inside the
small-world communities. The numerics shown correspond to
10 different realizations of the network with average number
of shortcuts per node, s = 1, 1.25, 1.5, ..., 3.75, 4. (c) Applying
Markov sweeping map to the ring of small-worlds with
s = 2.5 (mean path length inside the small-worlds ≈ 2.7) finds
the relevant partition into 5 communities, while standard Map
finds 23 communities in this case.

Figure 6. (Color online) Community detection with
heterogeneously sized subgraphs. (a) Ring of 6 cliques with
different sizes {10, 15, 20, 25, 30, 40}. Upper panel: number
of communities found by Markov sweeping map vs. Markov
time. Lower panel: both the compression gap δ (dashed green
line) and the alternative compression gap measure δ' (red
dashed-dotted line) clearly highlight the presence of a robust
partition into 6 communities. Inset: analysed graph.
(b) Ring of 6 rings with different sizes {10, 15, 20, 25, 30, 40}.
In this case the alternative measure δ' for the compression
gap is better suited for the analysis, indicating the presence
of the 6 rings by a relative minimum around Markov time 30.
Inset: analysed graph.

We have shown that using the dynamical zooming
provided by Markov time sweeping allows one to take 
into account multi-step flows and scan across all scales 
in a natural manner. The underlying idea is that, as 
time increases, the communities in the network will be- 
come more clique-like when analyzed through the time- 
dependent weighted transition matrix of the Markov pro- 
cess. Therefore, the map formalism can be used to detect 
long-range communities as the Markov time increases, 
and the relevant communities will be signaled by a low
compression gap. This Markov sweeping for the map 
equation can enhance the performance of the method 
by allowing it to detect non-clique communities and the 
presence of multi-scale community structure in networks. 
Importantly, the method still recovers all the results from 
the original map equation. 
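
The claim that communities look more clique-like at longer Markov times can be illustrated on the least clique-like case, a single cycle. The sketch below assumes the transition matrix P(t) = exp(-t(I - A/2)) for the unbiased continuous-time walk on an n-cycle and uses the inverse participation ratio of one row as a crude measure of how many nodes share the flow leaving a node. Both the measure and the names `cycle_transition` and `effective_reach` are choices made for this illustration, not quantities defined in the paper.

```python
import numpy as np

def cycle_transition(n, t):
    """P(t) = expm(-t (I - A/2)) for the unbiased walk on an n-cycle (n >= 3)."""
    idx = np.arange(n)
    A = np.zeros((n, n))
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    L = np.eye(n) - A / 2.0                  # symmetric: every node has degree 2
    lam, V = np.linalg.eigh(L)
    return (V * np.exp(-t * lam)) @ V.T

def effective_reach(P):
    """Inverse participation ratio of row 0: roughly how many nodes share the flow."""
    return 1.0 / np.sum(P[0] ** 2)

n = 20
for t in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(t, round(effective_reach(cycle_transition(n, t)), 2))
```

The reach starts at 1 for t = 0 and grows towards n as t increases, that is, the rows of P(t) approach the uniform pattern one would expect from a densely connected block, even though the underlying graph is a sparse ring.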

As stated above, the dynamic zooming across all scales 
effected by the Markov process is an integral ingredient of 
the method. Rather than just looking for the 'right' scale, 
the community structure emerges from the integration 
of the information gathered systematically at all scales. 
This approach can help alleviate the reliance on a global 
scale which can affect the results when dealing with net- 
works with communities with very heterogeneous sizes 
[38]. In particular, Markov-sweeping map is able to de- 
tect heterogeneous cliques as obtained through the LFR 
benchmark, a fact consistent with the notion that cliques 
are all effectively one-step and that standard Map already 
performs effectively on such benchmarks (see also Figure 6
for an analysis with heterogeneously sized cliques).
Similarly, our method performs well in detecting communities
in a ring of rings with very dissimilar sizes, as illustrated
in Figure 6, although when the heterogeneity of
the relative ring sizes becomes very large, our approach 
will not identify all rings at once at the same level of the 
hierarchy. To further improve the applicability of the
method to such problems, one can use different dynam- 
ics for the Markov process [19, 39]. This is an area of 
research we are currently pursuing. However, since there 
is no community detection algorithm that will serve all 
purposes for all possible applications, one should comple- 
ment the analysis with other methods based on different 
principles (e.g., local algorithms in those cases). 

Adding a dynamical dimension to Map through 
Markov sweeping is just one of the possible ways to enhance
the map equation and alternative approaches are
worth pursuing. One direction would be the modification 
of the coding scheme. For instance, a more rigorous treatment
would require removing the constraint of having
unique codewords within each community and also allowing
the encoding of walks instead of single-step codewords.
This generalization, however, would most likely lead to 
a breakdown of the simple coding picture that underpins 
the map equation. Our work emphasizes the importance 
of the choice of dynamics on the network and shows that 
using a dynamical perspective may lead to a more natural 
framework for community detection, especially when the 
underlying system has an inherent flow. In this paper, 
we have used the standard (unbiased) continuous-time 
random walk as a neutral first choice of dynamics. However,
other continuous-time or discrete-time processes are
possible (see also [19] for a related discussion) and can be
used to tune our community detection algorithm to different
characteristics of the network.

Code is available online [40]. 